Conceptual Exercises

Applied Exercises

4. Generate a simulated two-class data set with 100 observations and two features in which there is a visible but non-linear separation between the two classes. Show that in this setting, a support vector machine with a polynomial kernel (with degree greater than 1) or a radial kernel will outperform a support vector classifier on the training data. Which technique performs best on the test data? Make plots and report training and test error rates in order to back up your assertions.

set.seed(10)
x<-matrix(rnorm(100*2), ncol=2)
x[1:50,]<-x[1:50,]+2
x[51:75,]<-x[51:75,]-2
y<-c(rep(1,75),rep(2,25))
plot(x, col=y)

dat<-data.frame(x=x,y=as.factor(y)) # encode response as factor
train<-sample(100,50)

Plotting the data lets us check whether the classes are linearly separable; they do not appear to be. We will now fit a support vector classifier.

library(e1071)
svmfit<-svm(y~.,data=dat[train,],kernel="linear",cost=10,scale=FALSE)
plot(svmfit,dat)

Use tune() to perform cross-validation over a range of cost values and store the best model.

set.seed(1)
tune.out<-tune(svm,y~.,data=dat[train,],kernel="linear",ranges=list(cost=c(0.001,0.01,0.1,1,5,10,100)))
bestmod<-tune.out$best.model

Now we can predict the class label on a set of test observations.

table(true=dat[-train,"y"],pred=predict(bestmod,newx=dat[-train,]))
##     pred
## true  1  2
##    1 39  0
##    2 11  0

In this case, 11 of the 50 observations (22%) are classified incorrectly.
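Rather than counting errors off the confusion matrix by hand, the misclassification rate can be computed directly. A small helper (err_rate is my own name, not from the text):

```r
# Misclassification rate: the proportion of predictions that differ from the truth
err_rate <- function(truth, pred) mean(as.character(truth) != as.character(pred))

# Example: one wrong prediction out of four gives a rate of 0.25
err_rate(factor(c(1, 1, 2, 2)), factor(c(1, 2, 2, 2)))
```

The same helper works for any of the confusion-matrix comparisons below.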

Moving on to the support vector machine, we fit with a radial kernel.

svmfit<-svm(y~.,data=dat[train,],kernel="radial",gamma=1,cost=1)
plot(svmfit,dat[train,])

plot(svmfit,dat[-train,])

This is cool: the fit shows an apparent non-linear decision boundary. Now let’s tune over cost and gamma.

tune.out<-tune(svm,y~.,data=dat[train,],kernel="radial",ranges=list(cost=c(0.1,1,10,100,1000),gamma=c(0.5,1,2,3,4)))
summary(tune.out)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##     1     2
## 
## - best performance: 0.06 
## 
## - Detailed performance results:
##     cost gamma error dispersion
## 1  1e-01   0.5  0.28 0.25298221
## 2  1e+00   0.5  0.12 0.16865481
## 3  1e+01   0.5  0.10 0.14142136
## 4  1e+02   0.5  0.10 0.14142136
## 5  1e+03   0.5  0.12 0.10327956
## 6  1e-01   1.0  0.28 0.25298221
## 7  1e+00   1.0  0.12 0.13984118
## 8  1e+01   1.0  0.10 0.14142136
## 9  1e+02   1.0  0.10 0.10540926
## 10 1e+03   1.0  0.16 0.15776213
## 11 1e-01   2.0  0.28 0.25298221
## 12 1e+00   2.0  0.06 0.09660918
## 13 1e+01   2.0  0.06 0.09660918
## 14 1e+02   2.0  0.10 0.14142136
## 15 1e+03   2.0  0.18 0.17511901
## 16 1e-01   3.0  0.28 0.25298221
## 17 1e+00   3.0  0.08 0.10327956
## 18 1e+01   3.0  0.06 0.09660918
## 19 1e+02   3.0  0.12 0.10327956
## 20 1e+03   3.0  0.12 0.10327956
## 21 1e-01   4.0  0.28 0.25298221
## 22 1e+00   4.0  0.10 0.14142136
## 23 1e+01   4.0  0.06 0.09660918
## 24 1e+02   4.0  0.12 0.10327956
## 25 1e+03   4.0  0.12 0.10327956

Cross-validation selects a cost of 1 and a gamma of 2. Now time to predict!

table(true=dat[-train,"y"],pred=predict(svmfit,newx=dat[-train,]))
##     pred
## true  1  2
##    1 30  9
##    2  8  3
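The exercise also asks about a polynomial kernel with degree greater than 1. A sketch of that comparison, repeating the data setup so the chunk runs on its own (the cost and degree grids are my choices, and poly.tune is my own name):

```r
library(e1071)

# Repeat the setup so this chunk runs standalone
set.seed(10)
x <- matrix(rnorm(100 * 2), ncol = 2)
x[1:50, ] <- x[1:50, ] + 2
x[51:75, ] <- x[51:75, ] - 2
dat <- data.frame(x = x, y = as.factor(c(rep(1, 75), rep(2, 25))))
train <- sample(100, 50)

# Cross-validate a polynomial-kernel SVM over cost and degree
set.seed(1)
poly.tune <- tune(svm, y ~ ., data = dat[train, ], kernel = "polynomial",
                  ranges = list(cost = c(0.1, 1, 10, 100),
                                degree = c(2, 3, 4)))
poly.tune$best.parameters
plot(poly.tune$best.model, dat[train, ])
```

The selected degree and cost will vary with the seed, but the fitted boundary can be compared with the radial one in the same way as above.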

From the radial-kernel confusion matrix, 17 of the 50 test observations (34%) appear to be misclassified, so the SVM seems to be out-performed by the support vector classifier even though the data has a non-linear boundary, which the SVM accounts for but the SVC does not. The anomaly comes from two mistakes in the prediction calls. First, the prediction above uses svmfit (the initial fit with gamma = 1) rather than tune.out$best.model. Second, predict() for svm objects takes a newdata argument, not newx; the unrecognized argument is silently ignored, so both confusion matrices above actually cross-tabulate predictions on the training observations against the test labels, which makes the reported error rates meaningless.
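A sketch of the corrected comparison, repeating the setup so the chunk runs on its own (object names like lin.tune and rad.tune are mine). With newdata supplied and the tuned models used for prediction, the radial SVM should achieve the lower test error on this non-linearly separated data, though the exact counts depend on the seed:

```r
library(e1071)

# Repeat the setup so this chunk runs standalone
set.seed(10)
x <- matrix(rnorm(100 * 2), ncol = 2)
x[1:50, ] <- x[1:50, ] + 2
x[51:75, ] <- x[51:75, ] - 2
dat <- data.frame(x = x, y = as.factor(c(rep(1, 75), rep(2, 25))))
train <- sample(100, 50)

# Cross-validate both a linear SVC and a radial SVM on the training half
set.seed(1)
lin.tune <- tune(svm, y ~ ., data = dat[train, ], kernel = "linear",
                 ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)))
rad.tune <- tune(svm, y ~ ., data = dat[train, ], kernel = "radial",
                 ranges = list(cost = c(0.1, 1, 10, 100, 1000),
                               gamma = c(0.5, 1, 2, 3, 4)))

# predict() for svm objects takes 'newdata'; an unknown 'newx' is silently ignored
lin.pred <- predict(lin.tune$best.model, newdata = dat[-train, ])
rad.pred <- predict(rad.tune$best.model, newdata = dat[-train, ])

table(true = dat[-train, "y"], pred = lin.pred)
table(true = dat[-train, "y"], pred = rad.pred)

mean(lin.pred != dat[-train, "y"])  # linear SVC test error rate
mean(rad.pred != dat[-train, "y"])  # radial SVM test error rate
```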